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ABSTRACT 

The Milky Way Project citizen science initiative recently increased the number of known infrared 
i^ I bubbles in the inner Galactic plane by an order of magnitude compared to previous studies. We 

present a detailed statistical analysis of this dataset with the Red MSX Source catalog of massive 
young stellar sources to investigate the association of these bubbles with massive star formation. We 
particularly address the question of massive triggered star formation near infrared bubbles. We find 
a strong positional correlation of massive young stellar objects (MYSOs) and H ll regions with Milky 
Way Project bubbles at separations of < 2 bubble radii. As bubble sizes increase, a statistically 
J^ ' significant over density of massive young sources emerges in the region of the bubble rims, possibly 

indicating the occurrence of triggered star formation. Based on numbers of bubble-associated RMS 
sources we find that 67±3% of MYSOs and (ultra) compact H ll regions appear associated with a 
bubble. We estimate that approximately 22±2% of massive young stars may have formed as a result of 
feedback from expanding H ll regions. Using MYSO-bubble correlations, we serendipitously recovered 
the location of the recently discovered massive cluster Mercer 81, suggesting the potential of such 
analyses for discovery of heavily extincted distant clusters. 
Subject headings: Infrared: ISM; ISM: bubbles, HII regions; Stars: formation, massive 



kendrew@mpia.de 

^ ESQ, Karl-Schwarzschild-Strasse 2, 87548 Garching, Ger- 
many 

^ Harvard-Smithsonian Center for Astrophysics, 60 Garden 
Street, Cambridge, MA 02138, USA 

3 NSF Astronomy and Astrophysics Postdoctoral Fellow ^ Einstein Fellow 

^ Astronomy Department, Adler Planetarium, 1300 S. Lake ^ Department of Astronomy & Astrophysics, University of 

Shore Drive, Chicago, IL 60605, USA Chicago, 5640 S. Ellis Ave., Chicago, IL 60637, USA 



1. INTRODUCTION 

High-mass stars have a far-reaching effect on a galaxy's 
interstehar medium (ISM). Throughout their hfetimes 
their feedback shapes the surrounding cloud material. 
At the earliest stages of formation, powerful outflows en- 
ergize the natal molecular cloud, and once switched on, 
the UV radiation ionizes the young star's surroundings 
and carves out an H+-filled cavity. Ionizing radiation 
combines with stellar winds to clear out the surrounding 
gas and dust, forming bright shells, partial or complete 
bubbles. Upon their deaths as supernovae, massive stars 
inject as much energy into the ISM as they do during 
their lifetimes in the form of heating and shocks. These 
feedback mechanisms combine to shape and disrupt giant 
molecular clouds (GMCs), and eve ntually help regulate 
star formation on a Galactic scale (jHopkins et al.|]2QTTl 
lMatznedl2QQ2l ). 

1.1. Infrared bubbles 

Bright-rimmed bubbles around newly formed mas- 
sive stars and clusters are readily visible at infrared 
wavelengths, as the stars' UV radiation excites poly- 
cyclic aromatic hydrocarbons (PAHs) in the swept up 
shell, and hot dust surrounding the young stars ra- 
diates at mid-IR wavelengths. Large scale surveys 
with the Spitzer Space Te lescope, such as GLIMPSE at 
3.6/4.5/5.8/8.0 /am (Be niamin et al. 2003) and MIPS- 
GAL at 24/70 fim (;Carev et al.l [2009), and the re- 
cent ly^_coni2leted_ all^sky survey of the WISE mis- 
sion (Wright et ajj |2010D, are ideal datasets for iden- 
tifying such structures. i Churchwe ll et al.l (120061 120071) 
(C06, C07 hereafter) made a first attempt at catalogu- 
ing bubbles by visual identification in Spitzer/GLIMPSE 
images, identifying some 600 bubbles over the entire area 
covered by the survey (|/| < 65°, \b\ < 1°). Good 
correlation of IR bubbles with known H ll regions and 
relatively low contamination from supernova remnants 
(SNR), asymptotic giant branch (AGB) star bubbles and 
planetary nebulae (PNe) reported in the literature, indi- 
cates that bubbl es are a u seful tracer of star formation 
activity (Churchw ell et al . 2006; Deharveng et al. 2010). 

The Milky Way Project (Simpson et al. 2012, S12 here- 
after) recently produced an expanded catalog of 5106 in- 
frared bubbles from Spitzer imaging surveys. This order- 
of-magnitude increase in the number of known bubble 
objects presents a new opportunity to perform statistical 
studies of high-mass star formation on a Galactic scale. 
The sample is likely to be heterogeneous in nature but 
initial cross-matching with existing catalogs described by 
S12 indicates that many bubbles are associated with re- 
gions of massive star formation. 

1.2. Triggered star formation 

Triggered or sequential star formation is the pro- 
cess whereby feedback from energetic events sparks 
a second generation of star formation. Pro- 

posed cause s of tr i^^erin^ include supernova ex- 
plosions (Phillips et a l. 2009) ; formati o n of massive 
stars or clusters lElmegre en fc Ladal 11977 ): mas- 
sive stellar winds ( Castor et al.l Il975f ): protostellar 
outflow s (|Ba rsonv et al] Il998f ): and spiral density 
waves ("Roberts 1969) and galaxy-galaxy tidal interac- 
tions ( Woods et al.j ,200 6) on Galactic scales. If preva- 



lent in galaxies, triggering could provide a mechanism 
for the propagation of star formation through the galac- 
tic ISM, potentially supporting a mode of self-sustaining 
star formation. Comprehensive reviews of the the- 
or y of triggered star formation were recently given 
bv IZinnecker fc Yorkd (|2007) and Elmegreen (2011). 

Two scenarios for sequential star formation are com- 
monly proposed in the literature. lElmegreen fc Ladal 
(|1977l ) originally proposed the "collect and collapse" 
process, which occurs when neutral gas becomes swept 
up and compressed between the expanding shock and 
ionization fronts from the H II region, causing it 

to fragment and collap se. The theory is further 

described by lElmegreenI ()1998f ). The process acts 



on large spatial scales along the bubble rim and 
the fragmentation is thought to result in the forma- 
tion o f massive f ragments (>7 M©; Whitwor th et al.l 
119941 : iDale et al] [2011). The mechanism has been 
extens i vely t e sted, both with si n iulati ons (jDale et al.l 
I2007ai 120091 : IGritschneder et al.l I2009D and observa- 
tions of shells around known star-formi ng regions (e.g. 
'Deharveng et al."2005'; ' Deharveng et al.l 120101 : fBik et al.l 
2010; Brand et al. 2011). 

Radiatively driven implosion (RDI) is believed to act 
on smaller scales when pre-existing condensations in- 
side the clumpy molecular cloud, often in narrow pil- 
lars or globules, are compressed and driven to col- 
lapse by the pressure from the ionized material. In 
this case, the timescale for star formation to oc- 
cur is determined by the ionizing flux and the time 
taken for t he pressure disturbance t o reach th e con- 
densation (^Lefloch fc Lazare^ 11994 lElmegreenI 120111 : 
iMiao et al. l 2009; Bisbas et al. 201 B). The RDI pro- 
cess has been validated by smoothed particle hydrody- 
namic (SPH) simulations (Dale e t al. 2007b); hydrody - 
namical simulations presented by lErcolano et all ()2012l ) 
of ionizing feedback around massive stars are success- 
fully able to reproduce observed morphologies of H ll re- 
gion pillars and shells w here RDI is thought to take 
place. iLee fc ChenI (j2007f ) present observational evidence 
of RDI near OB associations, flnding that the process 
preferentially produces low- and intermediate-mass stars. 

The theoretical picture of triggered star formation is 
complicated by the fact that feedback from OB stars is 
known to have destructive effects on further star form- 
ing activity in the cloud. Expulsion of the gas by radi- 
ation pressure and ionization may halt any ongoing ac- 
cretion around nearby young stars, preventing the for- 
mation of new stars, limiting their ma ss, or leading to 
the d i sruption of the p arent clu s ter (iBoilv fc Kroupa 



2003 : JDale et al.' '2005; 'Matzner' '2002'; 'Hopki ns et al 
201l|; [Bastian fc Goodwin 2006). Matzner (2002) found 



H II regions to be the main source of turbulent energy 
injection in dense giant molecular clouds, which prevents 
further protostellar collapse. The effects of ionizing feed- 
back from young clusters on the ISM have als o been chal- 
lenged in the context of triggering by Dal e fc Bonnelll 
(2011). This calls into question the entire scenario of ex- 
panding H II regions driving massive star formation in 
the host cloud. 

1.2.1. Observational evidence 

The challenge for observational evidence of triggered 
star formation is establishing the causality between the 



original and subsequent star formation: did the initial 
star formation event really cause the following genera- 
tion, or did the clearing of the cavity simply uncover 
pre-existing star forming clumps and cores in the cloud 
material? 

The majority of observational studies into triggered or 
sequential star formation near SNR or H ll regions take a 
phenomenological approach, combining multiple datasets 
(typically a combination of near-, mid- and far-infrared, 
millimeter and radio wavelengths) to calculate age, mass 
and luminosity of triggering and triggered sources, and 
kinematic properties of the young stars and the sur- 
rounding ISM. Evidence of triggering has thus been re- 
port ed near a number of kn own H ll reffl ons, e.g. Sh2- 
212 (iDeharveng et all 120081): R CW12Q ^avagno et^ 
I2010r ): W51a (iKang et al.ll2009r ). Such studies offer rea- 
sonably convincing evidence of triggered star formation, 
however frequently conclude with open questions and un- 
certainties. Furthermore, they cannot address the ques- 
tion of how significant triggered star formation is on 
Galactic scales. 

1.2.2. A statistical approach: YSO clustering near bubbles 

To address the uncer tainties inherent in obse rvations of 
individual H ll regions. [Thompson et al.l (|2Q12L T12 here- 
after) used a different approach. They performed a cor- 
relation analysis of bubble and YSO populations in the 
inner Galactic plane to investigate YSO clustering prop- 
erties in the vicinity of IR bubbles. With a well-studied 
sample of 2000 sources extracted from mid-infrared im- 
ages [8/12/14/21 ^m), the Red MSX Source (RMS) cat- 
alog (|Urquhart et al.ir2008l ) provides a suitable dataset of 
massive young stellar sources for such a study. 

Using the C06 catalog of bubbles from the GLIMPSE 
survey and a sample of massive YSOs (MYSOs) and 
ultra-compact H ll regions (UCHII) from the RMS cat- 
alog, they identify a statistically significant overdensity 
of MYSOs on the scale of 1 bubble Reff. If we assume 
triggering is real, their result allows an estimation of 
the prevalence of triggered star formation in the inner 
Galaxy. How much of the Galaxy's star forming activ- 
ity may have been sparked by preceding star formation 
euisodes? 

T12 find that 14% of their RMS MYSO/UCHII re- 
gion sample lie within two (bubble) radii from a bubble. 
Based on their observed overdensity, the authors esti- 
mate that the formation of >14% of MYSOs in the in- 
ner Galactic plane may have been triggered by feedback 
from nearby massive young stars or clusters. This how- 
ever assumes that all YSOs found near bubble rims are 
forming as a result of triggering. Because of the incom- 
pleteness of the COG bubble sample, they present this as 
a lower limit that may increase by a factor ~2 given a 
more complete bubble sample. With the first release of 
bubble data from MWP, such a sample is now available. 
In this paper we expand the analysis presented in T12 
to include the MWP bubbles, to investigate statistically 
the potential prevalence of triggered star formation on 
Galactic scales. 

The paper is organised as follows: in Section [2] we de- 
scribe the catalogs used for this analysis, their properties 
and limitations. Section [3] presents the method used for 
the correlation analysis, and how it was applied to the 
datasets. In Section [H the analysis results are presented: 



we first examine the auto-correlation properties of the 
RMS dataset used in the subsequent analyses. We show 
our reproduction of the T12 result using the Church- 
well/RMS catalogs, and then extend this to the larger 
MWP sample. We examine correlations for a number 
of subsamples to investigate the sensitivity of the result 
to a number of bubble parameters, notably size, thick- 
ness, and RMS source type. Limitations of the results 
and their implications of our findings on the prevalence 
of triggered star formation are discussed in Section [5l 

2. DATA CATALOGS 

This section describes the catalogs referenced in the 
work presented in this paper. The region of overlap be- 
tween the RMS, C06 and MWP catalogs covers 10° < 
|/| < 65°, \b\ < 1°, and for each dataset sources were 
selected within these Hmits only. Longitude and latitude 
distributions are plotted for all three catalogs together 
in Fig. [Hand El 

2.1. Milky Way Project bubbles 

The main focus of the analysis presented in this pa- 
per consists of the Milky Way project (MWP) Data Re- 
lease 1 (DRl) bubbles, presented in S12. These bubbles 
were identified by over 35,000 users of the MWP cit- 
izen s cience websitql, a p roject created by the Zooni- 
verse (|Fortson et al. I [20081 ). in RGB images from the 
Spitz er Space Telescope GLIMPSE and MIPSGAL sur- 
veys (BenjaniinelaD[2003[; [Carey et al.'"2009). The color 
composites were created from the 4.5/8.0/24.0/im im- 
ages over the coordinate range |/| < 65°, \b\ < 1.0°. 
Images were presented online to users, who were asked 
to draw the outlines of bubbles using an ellipse-drawing 
tool. From the inner and outer ellipse sizes, effective radii 
(Reff) and thicknesses (teff) were calculated as simple de- 
scriptive metrics for the bubbles, using the equations of 
C06: 
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where Rin, Rout are the inner and outer semi-major axes, 
and rin^ Tout the inner and outer semi- minor axes re- 
spectively (COG, SI 2). The minimum inner and outer 
diameters of the drawing tool cover 0.45' and 0.65', re- 
spectively. The minimum Reff corresponding to these 
values, following Equation [TJ is 0.27'. The catalog lists 
inner and outer diameters (or ellipse axes for non-circular 
bubbles), eccentricities and position angles. Data cata- 
logs for MWP are publicly available onlin^E- The site 
also includes a Data Explorer page that visualises the 
bubble data. 

MWP-DRl consists of two separate catalogs: the large 
and small bubbles. The small bubbles were not drawn as 
ellipses but with simple box shapes. They do not there- 
fore have listed thicknesses in the catalog, nor positional 
uncertainties. This study uses the combined large and 
small bubble sample where it overlaps with the region 
covered by the RMS sources (see Section [23]) . The sam- 
ple contains 4434 bubbles, of which 3260 are 'large' and 

^ http://www.milkywayproject.org 

^ http://www.milkywayproject.org/data 
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Figure 1. Distribution with galactic longitude of the three catalogs: RMS all young sources (green), C06 bubbles (red) and MWP bubbles 
(blue). Note that the region |/| < 10° was excluded as this is not covered by the RMS catalog. 



500 



400 



300 



MWP bubbles 
C06 bubbles 
RMS YSOs 



o 
o 



200 



100 



-0.5 0.0 

galactic latitude (deg) 

Figure 2. Galactic latitude distribution of the three catalogs: 
RMS all young sources (green), C06 bubbles (red) and MWP bub- 
bles (blue). Overplotted are the best-fit Gaussian functions, used 
for the catalog randomisations. 

1174 'small'. Fig. [3] shows the distribution of Reff of all 
bubbles, and of the large and small samples individu- 
ally. To avoid confusion with simple size descriptions, 
we identify these samples as the 'MWP-L' and 'MWP-S' 
bubbles. 

As noted in S12, several large-scale structural features 
of the Milky Way galaxy can be traced in the longitude 
distribution of the bubbles (Fig. [1]), most notably the 
Sagittarius arm near / = 50°, the Scutum arm near / = 
25 — 35° and / = —55°, and the Norma arm around 
/ = -30° (S12). 

In S12 the completeness of the catalog was estimated at 
>94%, based on the decline in bubble discovery rates over 
time and limitations of the clustering algorithm used to 
identify bubbles. This limit applies within the size range 
accessible with the classification tools: the minimum 
bubble Reff of 0.27' determined by the ellipse-drawing 
tool corresponds to 0.24 pc at 3 kpc, increasing to 1.2 pc 
at 15 kpc. The size of the largest bubble in the sample 
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Figure 3. Distribution of MWP bubble effective radii, in arcmin, 
for the full sample (blue line; 4434 bubbles), MWP-L (red line; 
3260 bubbles) and MWP-S (greenline; 1174 bubbles) subsamples. 
The distribution is truncated at 3^ for better legibility of the plot; 
269 bubbles have sizes beyond this range. The dashed line indicates 
the size of the MSX beam. 

measures 11.7' in Reff. The small bubbles are subject 
to a higher uncertainty in position and size because of 
the coarser drawing method used. Cross-matching with 
the catalog of H ll regions of I Anderson et al.] (|2Q11[ ) has 
allowed the distance to be determined to 189 bubbles in 
the full MWP sample. The distances, presented by S12, 
cov er the ra nge of 2.2 to 14 .4 kpc (Fig.|4]). We note that 
the l Ander son et al.^ (I 2Q11 I) catalog covers just a small 
longitude region and contains only sources identified as 
H II regions. The distances shown for this small subsam- 
ple of bubbles may thus not be representative of the full 
set. 

2.2. Churchwell bubbles 

C06 described their sample of 322 partial and closed 
rings visually identified in infrared images from the 




distance (kpc) 

Figure 4. Distribution of known distances to 189 MWP bubbles 
(17 9 MWP-L, blue 10 MWP-S, green) from cross-matching with 
the I Anderson et aLl |2011) catalog of H ll regions. 

GLIMPSE survey by a small number of authors, mark- 
ing the first attempt at cataloguing these objects. They 
found approximately 25% of the sample to be coincident 
with known H II regions, and 13% with open clusters; 
iDeharveng et al.l (|2Q1Q[ ) report a much higher associa- 
tion, in the range of 85-95%. Observational biases are 
described, indicating that bubble discovery can be im- 
proved using a larger number of independent visual in- 
spections of the images. Small bubbles in particular are 
thought to be lacking from the catalog. 

In contrast with the Milky Way Project, the Church- 
well bubble identifications did not use the MIPS GAL 
24 /im images. We refer to the Churchwell bubbles used 
in this paper as the 'C06' sample, which contains 315 
bubbles after limiting the number to the area covered by 
the MWP sample. C06 make only a rough estimate of 
the completeness of their catalog ( "on the order of 50% or 
less" ) , and indeed the bubble counts in the newer MWP 
catalog are a factor ~10 higher than C06's sample. The 
COG bubble sample is used only for benchmarking of our 
method used in this analysis, and verification of T12's 
recent result. Effective radii are defined according to 
Equation [H 

2.3. Red MSX Source catalog 

The Red MSX Source (R MS) catalog (jUrauhart et al.l 
120081 : iLumsden et al.|[2QQ2l ) contains around 2000 sources 
selected from mid-infrared images from the Mid- 
cours e Space Experiment (MSX) Satellite (Price et al.l 
120011) and near- infrared im aging from the 2MASS sur- 
vey (|Skrutskie et al.l 120061 ) . The sources are detected 
at a spatial resolution of 18'^ Selection criteria for 
massiv e young stellar objects (MYSO), described in de- 
tail bv ILumsd en et al. (2002), are based on photometric 
properties of known MYSOs. The aim of the survey is 
to assemble a complete catalog of Galactic MYSOs to 
> 10^1/0. T12 estimate the sample to be complete to 
the dista nce of the furthe st bubble in the COG sample 
(^15 kpc; Deharveng et al.i i2010l) . Follow-up work is un- 
der way to determine the nature of the sources in the 



initial catalog (lUrquhart et al.l [201 iL [2QT2f ). Because of 
CDufusion and difficulties in distance determination, the 
r 3gion |/| < 10° is excluded from the catalog. 
_ From the publicly available RMS catalogj, we se- 
kicted those sources collected under the header "all 
yDung sources". This sample of 1573 sources con- 
t lins diffuse H ll regions, (compact) H ll regions, 
YSOs, 'HII/YSOs', and a number of sources classified 
as 'young/old star?'; details of these classifications were 
provided by J. Urquhart (private communication, 2012). 
The 'H II regions' in the catalog are either compact or 
ultra-compact, whereas 'diffuse H ll regions' are typi- 
cally extended with respect to the MSX 18''beam, likely 
tj 3presenting a more evolved phase. The 'HII/YSO' clas- 
s fication signifies sources that display properties of both 
object types, perhaps indicating that the source is in 
a transitional stage, or contains multiple sources in the 
^gam. The 'young/old star?' sources display confiict- 
ing properties, however, most are thought to be evolved 
stars. These form a very small portion of our sample 
(<1%). For simplicity we refer to the sample as 'YSOs', 
however the range of object types this represents in the 
sample may well be of relevance to the interpretation of 
the results. 

After selecting those sources in the overlap region with 
the bubble catalogs (10 < |/| < G5°, \b\ < 1.0°), 1018 
sources remain. Their distribution with galactic lon- 
gitude and latitude is shown in Fig. [1] and [2] respec- 
tively. The type classifications by the RMS team suggest 
the following distribution: ~51% of sources are H ll re- 
gions, -14% diffuse H II regions, -32% YSOs and -2% 
HII/YSOs and -1% young/old stars. 

The RMS catalog contains distances for the majority 
of sources. These were determined either from the liter- 
ature or from observations in NH3, CO/CS, methanol or 
water maser velocity; where the source forms part of a 
larger complex, the distance to the complex was adopted 
(from similar measurements or the literature). Out of 
the 1018 sources in our sample, 74 have no distances in 
the catalog and for a further 58 the kinematic distance 
ambiguity (KDA) is unresolved; for this latter group we 
choose the near distance. The distribution of YSO dis- 
tances is shown in Fig. [5l 

3. CORRELATION ANALYSES FOR CLUSTERING STUDIES 

Angular correlation functions are a commonly used 
tool for the identification of e.g. galaxy clusters in dense 
fields, where they allow for the identification of over den- 
sities in source counts compared with a random distribu- 
tion (e.g. Papovich 2008; Quadri et al. 2008; Hatch e tld] 
1201 1). The angular correlation function w{6) is de- 
fined as the excess probability of finding two objects, 
or two types of objects, separated by a distance 0. This 
stu dy employs the commo nly used Landy-Szalay estima- 
tor (jLandv fc Szalaylll993f ) for calculating the correlation 
function w{0): 



w{0) = 



NpD - 2Ndr + Nrr 
Nrr 



(3) 



where 6 is the separation between objects and N rep- 
resents the pair counts between data points (Njjjj)^ be- 

^ http://www.ast.leeds.ac.uk/RMS/ 



120 



100 



80 



o 



60 



40 



20 





















n- 






[J 
J 


1 


n 






u 


f 






\? 


U 


'^ r, : n 



10 15 
distance (kpc) 



20 



25 



Figure 5. Distribution of distances to the RMS YSOs, for 942 
sources with published distances. For those sources with unre- 
solved KDAs (58 sources) the near distance was selected. 

tween data and random points (Ndr), and between ran- 
dom and random points (Nrr). Equation [3] applies 
to one single set of sources, describing thus the auto- 
correlation or the intrinsic clustering properties of the 
data. The method can be generalised to be applicable to 
two different datasets using: 



w{0) 



Nr^r^ 



(4) 



with the same symbols (jBradshaw et al.l l2Qllf ) , where 
subscripts 1 and 2 indicate the bubble and YSO sam- 
ples respectively. Total pair counts were normalised such 
that: 



(5) 
The random bubble catalog was generated with ran- 
domly distributed longitudes, latitudes and effective 
radii. The latitudes were drawn from the best-fit Gaus- 
sian latitude distribution (shown in Fig. [2]), and the ef- 
fective radii follow the best-fit log-normal distribution, 
to ensure that no artificial over- or underdensities are 
introduced by the randomisation as compared with the 
data. The longitudes were distributed uniformly within 
the coordinate coverage area. Random YSOs were gen- 
erated similarly in longitude-latitude space. The random 
catalogs contained a factor of 50 more objects than the 
corresponding data catalogs, ensuring good sampling of 
the covered area and sufficient number counts in each 
bin. 

Bootstrap resampling, implemented via random sam- 
pling with replacement of t he bubble catalo g, was used 
to estimate sampling errors (jLing et al.lll986f ). 100 boot- 
strap iterations were carried out for the analysis. Uncer- 
tainties are presented at the 1-a level throughout. 

Given that massive stars are kno wn to form alniost ex - 
clusively in clustered environments (jLada fc Ladal[2QQ3f ) . 
a comparison between the bubble- YSO correlation and 
the YSO auto-correlation is important for the interpre- 
tation of the correlation function output. This allows us 



to assess whether any observed clustering signal is phys- 
ically meaningful, or simply a reflection of the underly- 
ing YSO clustering. To this end we perform an auto- 
correlation analysis using Equation [3l 

4. RESULTS 

The positional correlation analysis described above was 
carried out for a number of instances of bubble and YSO 
catalogs, to compare to the findings of T12 and assess the 
physical significance of the correlations observed. The 
code used to perform the analysis was written in Python, 
and is publicly available (see Section [7j). 

A number of diagnostic plots were produced for each 
analysis run, to assess the code performance and provide 
information on sensitivities of the method. Fig. [6] shows 
a comparison of the distributions of data and random 
catalogs for bubbles in longitude, latitude and effective 
radius. Similar plots were produced to check the YSO 
random catalog distributions. 

The second diagnostic plot shown with each analysis is 
a box plot of the total pair counts, prior to normalisation, 
in each bin over the bootstrap iterations (e.g. Fig. [7j). 
The plot shows the median pair counts in each bin of 
(red horizontal line), the boxes span the lower to upper 
quart iles, and the whiskers show the range 1.5 x the inner 
quartile range. Outlier points beyond these values are 
marked with x. These plots inform about the dispersion 
and skew of the pair counts across the bootstraps. 

4.1. YSO auto- correlation 

Important for the interpretation of the correlation 
plots that follow is the auto-correlation of the YSO sam- 
ple, which describes the intrinsic clustering properties of 
these sources. The auto-correlation was calculated us- 
ing Equation [3l using a random catalog size containing 
50 times the input sample size (roughly 50,000 random 
sources). Variances were obtained using the bootstrap- 
ping method, as described above. We performed 100 
bootstrap iterations. 

Following T12, the YSO sample was divided into a 
'bubble- associated' and a 'control' sample. YSO's lo- 
cated within 2 Reff from the nearest bubble are placed 
in the former group, and those > 3Reff from the near- 
est bubbles in the latter. Performing this division using 
both the C06 and MWP bubble catalogs yields striking 
differences: while the 'associated' and 'control' groups of 
YSOs contain 140 and 824 sources (14% and 81%) when 
split using the COG bubbles, these fractions are very dif- 
ferent when using the MWP bubbles. When measured 
against the full sample of MWP bubbles, 67% (678) of 
YSOs lie within 2 radii from a bubble, and the control 
group contains just 227 sources. 

We can examine this more closely by studying the as- 
sociations with the MWP-L and MWP-S bubble samples 
separately. This shows that 644 YSOs lie within 2 radii 
of a MWP-L bubble, the control group containing 251 
YSOs (63% and 25% respectively); when comparing with 
the MWP-S bubbles, only 127 YSOs are 'associated' and 
865 are 'control'. This indicates that the majority of 
bubble-associated young sources are found in close prox- 
imity to large (MWP-L) bubbles, which follows naturally 
from the fact that larger bubbles cover a larger area of 
sky than their smaller counterparts. It is important to 
note that this approach cannot distinguish between those 
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Figure 6. Comparison of distribution with longitude (bottom), 
latitude (middle) and Rgff (top) of the MWP bubbles (black) and 
their corresponding randomised catalog (red). The random cat- 
alog in this case contains 50 times more sources than the data 
(~ 2.2 X 10^ bubbles). The counts are normalised for legibil- 
ity. The longitude randomisation was uniform over the coordinate 
range covered; the latitudes and radii follow the best-fit empirical 
Gaussian and log-normal functions, respectively. 

RMS sources that are associated with bubbles versus co- 
incident with them; particularly for the MWP-S bubbles, 
whose radii are comparably-sized to the MSX beam, this 
is a strong possibility. 

By normalising the number of YSOs associated with 
each bubble by the bubble's area (using the circle traced 
by 2Reff), we find that the mean YSO source density 
towards MWP-L bubbles is 0.06±0.23 sources/arcmin^. 
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Figure 7. Example of a diagnostic box plot showing the abso- 
lute (non-normalised) C06 bubble-RMS YSO pair counts over the 
bootstrap iterations (N=100) for the computation of ^DiD2 ^^d 
^DiR2- The red horizontal lines show the median pair counts in 
each bin of 6, the boxes span the lower to upper quart iles, and 
the whiskers show the range 1.5 x the inner quartile range. Outlier 
points beyond these values are marked with x. Bins are in units of 
Reff, as in the correlation plots. 



and towards MWP-S we find a mean density of 0.10±0.36 
sources/arcmin^. When the calculation was repeated us- 
ing a random catalog of YSOs of the same size, with the 
randomisation performed as described above, the equiv- 
alent mean surface densities towards the bubbles were 
found to be at least a factor 10 lower than calculated 
from the real MYSOs, suggesting the YSO surface den- 
sity enhancement is significant towards all bubbles. The 
variances on the data values are large and the significance 
of the difference between the MWP-L and MWP-S sam- 
ples is hard to deduce. A 2-sample Kolmogorov-Smirnov 
(K-S) test on the distributions returned a p- value of 0.87, 
which does not permit us to reject the hypothesis that 
these two samples were drawn from the same distribu- 
tion. In other words, we cannot state that the different 
mean YSO source densities associated with MWP-L and 
MWP-S bubbles is a statistically significant effect. 

Further insight can be gained from comparing the dis- 
tribution of source types in the 'associated' and 'control' 
YSO samples. Of all diffuse H ll regions in the sam- 
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Figure 8. RMS YSO auto-correlation plots. Left: YSO auto-correlation functions for the full sample (circles), bubble-associated (x) and 
control (+) samples. The dashed line indicates the median Reff of the C06 bubbles, the dotted lines indicate the lower and upper quartile 
limits of the size distribution. Right: As left, with the sample division and sizes based on the MWP bubbles. 



pie, 86% are associated with a bubble; as are 60% of 
H II regions, 65% of HII/YSOs, 56% of YSOs and 57% 
of young/old stars. The majority of these associations 
are with MWP-L bubbles, as the above numbers indi- 
cate. 

Given the lack of distance determinations for the bub- 
ble sample, the physical meaning of these numbers is 
hard to assess. MWP-S bubbles may be more distant 
than their large counterparts, in which case the YSO 
association figures may indicate a completeness limita- 
tion in the YSO sample. The increased surface density 
of YSOs towards MWP-S bubbles may be a result of a 
higher level of coincidence, rather than 'association', be- 
tween RMS sources and bubbles. Alternatively, if MWP- 
S bubbles are younger than MWP-L bubbles, star for- 
mation may not yet have been triggered, or the central 
sources may not have swept away enough cloud mate- 
rial to reveal ongoing star formation. A third possibility 
is that MWP-S bubbles surround relatively nearby stars 
that are simply not luminous enough to ionize a large 
bubble and trigger the formation of new stars. Finally, 
contamination has not yet been studied in detail, and 
some MWP-S bubbles are likely to be unconnected to 
star formation (representing instead e.g. PNe or SNR). 

Using the known distances to 189 MWP bubbles, we 
can compare distances MWP-L (179 bubbles) and MWP- 
S bubbles (10 bubbles) to assess the likelihood that the 
two are drawn from the same distribution using a 2- 
sample K-S test. The test returns a p- value of 0.124, 
i.e. we cannot reject the hypothesis that the samples are 
drawn from the same distance distribution. This leaves 
open the question of the nature of the MWP-S as com- 
pared with the MWP-L bubbles, but does not indicate 
that the MWP-S are likely to be preferentially nearer or 
further than their MWP-L counterparts. 

The auto-correlation was computed for both divisions 
of the YSO sample. Fig. [8] shows the auto-correlation 
of the YSOs based on the C06 bubble sample (left) and 



the equivalent result using the MWP bubble catalog as a 
benchmark for the sample division. The clustering prop- 
erties for the 3 samples are qualitatively very similar, 
and similar to the auto-correlation when benchmarked 
against the C06 bubbles. Strong clustering is seen on 
the smallest scales, with the bubble-associated sample 
showing a stronger correlation signal than the full and 
control samples in the range 0.5-1' (at 7-<j). 

The observed auto-correlation properties of the RMS 
sample suggests that massive young stellar sources are 
preferentially found in close proximity to each other over 
what is expected from a random distribution (taking into 
account typical scale heights); in other words, massive 
stars, as is known, preferentially form in clustered en- 
vironments. Importantly, no clustering of YSOs is ob- 
served on any spatial scales that are related to bubble 
sizes; any over density observed on characteristic bubble 
scales is therefore not an intrinsic clustering property of 
this sample. 

4.2. YSO clustering around COG bubbles 

As a first test the analysis was carried out using the 
C06 bubble catalog, to verify consistency of the imple- 
mentation of our method against that of T12. The result- 
ing correlation function is shown in Fig. [9] (black circles). 
A strong positive correlation is observed out to 1 Reff, 
with a peak around 1 i^eff observed with a correlation 
value of ~3 at 4-cr, confirming the overdensity described 
in T12. The lower absolute value of the correlation and 
the somewhat lower significance of the peak can be as- 
cribed to the different binning used in the plot. Beyond 2 
bubble radii, the correlation is effectively non-existent or 
slightly negative, which would indicate a relative sparsity 
of sources compared with a random distribution. 

The box- and- whisker plots. Fig. [71 show the median, 
spread and outliers in pair counts (prior to normalisa- 
tion) in each bin over the bootstrap iterations for Nd^d^ 
and NdiR^i the instances that contain the bootstrapped 
bubble catalog. As per Section [3l sample 1 represents the 
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Figure 9. Correlation function between the matched sample of 
bubbles from MWP-DRl and C06 (275 bubbles) and RMS YSOs, 
compared with that of the C06 sample. 

bubbles (COG in this case), sample 2 the RMS sources. 
The plot shows a relatively large spread in pair counts 
and a clear skewing of the distribution at small separa- 
tions (bins 1-7), which is a result of the small sample 
sizes used in this analysis. The low number counts also 
explain the relatively large error bars in Fig. [9l 

A potential source of discrepancy between this result 
and that of T12 is the different sample of RMS sources 
used. As described in Section[2l we use the publicly avail- 
able catalog of "all young sources" from the RMS survey. 
This sample contains YSOs as well as diffuse H ll re- 
gions and all evolutionary stages in between. T12 report 
using a sample of only YSOs and UCHII regions, which 
contains fewer sources than the set used here. Given the 
high levels of coincidence between bubbles and diffuse 
H II regions, seemingly confirmed by the comparison of 
bubble-associated and control YSOs, a higher number of 
H II regions in our sample may well increase the overden- 
sity in the YSO counts at the smallest separations. 

4.3. YSO clustering around MWP bubbles 

The analysis was repeated using the MWP bubble 
dataset as described in Section O the YSO catalog was 
identical to that used in the C06-RMS analysis. As be- 
fore, random catalogs were constructed with 50 times the 
number of sources in the input catalog, and 100 boot- 
strap iterations were performed. 

4.3.1. Analysis checks 

A number of quality checks were performed prior to 
the full analysis, to examine potential sensitivities and 
biases of the analysis method to input parameters. 

As a first consistency test the analysis was performed 
with only those MWP bubbles that are also present in the 
C06 catalog. In S12 the catalogs were cross- matched, and 
the 006 bubble ID is hsted in the MWP data catalogs. 
This cross-matching between the 006 and MWP catalogs 
reveals more complex associations than a simple one-to- 
one matching: in some cases a single C06 bubble contains 
multiple smaller MWP bubbles, in others multiple O06 
bubbles are merged into one large MWP bubble. For 
this exercise we use only those MWP bubbles that are 
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Figure 10. Comparison of position of 275 MWP-C06 matched 
bubbles. Outliers are marked with squares. 
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Figure 11. Histogram of the ratio of effective radii of the 
C06/MWP matched bubbles, showing how the sizes differ for 
equivalent bubbles in the two catalogs. The dashed line marks 
the median value of 1.02, the dotted lines mark the limits of the 
lower and upper quartiles at 0.92 and 1.17, respectively. 



associated with one single O06 bubble. In the cases where 
multiple MWP bubbles are associated with the same C06 
bubble, we use only the MWP bubble with the closest 
positional match. Over the relevant coordinate region, 
this yielded 275 MWP bubbles. 

Fig. [To] and [TT] show the difference in coordinates and 
radii of these 275 bubbles. A detailed comparison of 
bubble parameters in the C06 samples and MWP is also 
presented in S12. 

Median difference in bubble center positions is 0.06' in 
both longitude and latitude. For 90% of the 275 bub- 
bles the difference is below 0.30' and 0.28' in longitude 
and latitude, respectively. Four bubbles, marked with 
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squares in Fig. [TOl lie more than 10' from their counter- 
parts in COG. Closer inspection reveals their sizes to be 
discrepant as well, which is likely to be a result of the 
complexities in the associations described above. The 
median size ratio of MWP:C06 parameters is 1.02, or 
2%, with the full range covering ratios of 0.034 to 8.3. 
90% of MWP bubbles have sizes within 55% of the C06 
value. These numbers, plotted in Fig. [11] indicate that 
the MWP catalog size is typically somewhat larger than 
that in the COG catalog for a given bubble. The dif- 
ferences are small compared with the bubble sizes, with 
some outliers. 

The resulting correlation function is shown in Fig. [9l 
with the C06-RMS correlation overplotted for compar- 
ison. The strong peak at 6>=lReff is not present in the 
correlation, however two peaks are present at separations 
of 0.8 and 1.4 Reff with respective significances of 2.5 and 
3.2-<j. The bubbles with large discrepancies between the 
catalogs in their center coordinates are not thought to 
affect this result. Those with discrepant radii are likely 
responsible for the observed differences in the correla- 
tion function, as the bubbles' Reff is essentially what the 
bubble- MYSO separation are measured against. 

The most relevant finding from this test is that the 
low number counts in the C06 and C06/MWP matched 
samples make the resulting correlation function very sen- 
sitive to small sample difference and choice of binning in 
0. Further reducing the numbers by excluding outliers 
does not improve the significance of the results. 

A second question to address is related to the compar- 
ative number counts of the datasets used. In the COG- 
RMS analysis, 315 bubbles and 1018 YSOs are included; 
a YSOibubble ratio of roughly 3:1. When repeating the 
analysis with the full MWP sample of 4434 bubbles, this 
ratio is reduced to 0.2:1. Can any overdensity of YSOs 
near bubbles still be recovered when the catalog contains 
just 1 YSO per 5 bubbles? To investigate this, 4 artifi- 
cial YSO catalogs were produced with an equal number 
of sources as the RMS sample, with respectively 100%, 
50%, 25% and 10% of YSOs placed near the rim of a 
randomly chosen bubble. This was implemented by as- 
signing a radial separation of a random number between 
0.8 and 1.6 x Reff of the bubble and a randomly cho- 
sen angle between and 27r. The remaining 'fake YSOs' 
were assigned coordinates of a random 'real' RMS YSO. 
The correlation analysis was carried out between the full 
MWP bubble sample and each of these fake catalogs, to 
check whether an overdensity can be recovered. 

The result of this test is shown in Fig. [121 For each 
dataset, a positive correlation with high statistical signif- 
icance is observed in the bins covering 6> = 0.8 — 1.6i?eff- 
As the fraction of artificial YSOs in a set decreases, the 
correlation signal at the smallest bubble- YSO separa- 
tions increases, and the peak around 1 i^eff gradually 
weakens. The correlation for the '10% fake YSOs' case is 
very close to that seen for the full dataset (black circles 
in Fig. [13]), as expected, given that this sample contains 
90% of the 'real' RMS sources. With increasing 'fake' 
fraction, the correlation at the smallest separations de- 
creases as our test method essentially moves YSOs from 
bubble centres onto the rim regions. For the extreme case 
where 100% of YSOs are 'fake', all YSOs are placed near 
bubble rims, and the correlation at all other separations 
is roughly zero. 
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Figure 12. Correlations between MWP-DRl bubbles and fake 
YSO catalogs with fractions of YSOs placed deliberately within 
0.8-1.6 Reff from a randomly chosen bubble. Data is shown for fake 
YSO samples containing 100% (black circles), 50% (blue squares), 
25% (green diamonds) and 10% (red triangles) of YSOs on bubble 
rims, indicating the detection limit of the analysis method. 

For the sets containing 25, 50 and 100% of artificially 
placed YSOs, strong peaks are observed in the bins cov- 
ering 6> = 0.8 — 1.6i?eff, indicating that the signal can 
be recovered even for low number counts of YSOs vs. 
bubbles, and a substantial fraction of 'real' YSOs. 

4.3.2. MWP bubbles-RMS MYSO angular correlation 

Following initial quality checks, the analysis was per- 
formed for the full MWP bubble and RMS YSO sets. In 
addition, the correlation was carried out on the MWP-L 
and MWP-S bubble sets individually. For each case, ran- 
dom catalogs were created with 50 times the number of 
sources in the input catalog, and 100 bootstrap iterations 
were performed. The resulting correlations are shown in 
Fig- El The improved statistics on the analysis resulting 
from the larger sample size is reflected in the smaller size 
of the error bars. 

All three correlation functions display a strong cluster- 
ing signal of the YSOs on scales of < li^eff of the bubbles 
in the sample. The correlations of the full sample and the 
large bubbles are almost identical, displaying a decrease 
in correlation from to 1 Reff- The MWP-S correlation 
with YSOs appears higher than for the full and MWP-L 
samples at the smallest separations, consistent with the 
YSO source density calculations described in Section l4m 

The correlation function is markedly different from the 
C06-RMS result presented here and by T12. However, 
given the huge increase in bubble sample size this should 
not be unexpected. 

The strong clustering signal in the YSO auto- 
correlation on small spatial scales, and typical bubble 
sizes, make it hard to interpret the observed correlation. 
The analysis check shown in Fig. [T2l indicates that >10% 
of ~1000 YSOs were required to be placed along bubble 
rims for the signal of the overdensity to become signifi- 
cant over the intrinsic clustering trend of the YSOs. It is 
therefore not sufficient to look simply at the 'associated' 
and 'control' groups in the YSOs, as described in Sec- 
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Figure 13. Angular correlation between MWP bubbles and RMS 
YSOs for the full bubble sample (black circles) and the MWP-L 
and MWP-S bubble sets (red x and blue + respectively). 

tion l4.H we need to determine specifically the number of 
YSOs located near the bubble rims. 

Of the 678 YSOs in the 'associated' sample, 225 are 
located 0.8-1.6 Reff from the center of a MWP bubble. 
87% of these are classified as compact H ll regions or 
MYSOs. The percentage of H ll regions (55%) is some- 
what higher than in the general RMS YSO sample (51%), 
however within the Poisson noise on these number counts 
this is not a significant difference from the overall YSO 
type distribution. The analysis check performed with 
the 'fake' YSO catalog suggests that this proportion of 
YSOs (22% of the full sample) placed near bubble rims 
should be recoverable as an overdensity by the correla- 
tion analysis. It is possible that the positive correlation 
is simply 'drowned' out by the overdensity at the small- 
est separations. In the following section, we examine the 
correlation between subsamples of YSOs and bubbles to 
see if different correlations are observed. 

4.3.3. Bubble and YSO subsamples 

The large number of bubbles in the MWP sample al- 
lows us to explore specific subsamples of bubbles with 
potentially meaningful properties. The different corre- 
lations and YSO source densities for the MWP-S and 
MWP-L samples, and the difference in correlation func- 
tions between the MWP and C06 bubbles, invite a closer 
examination of the role of bubble size in the bubble- YSO 
correlation. 

First, the normalised size distribution functions for the 
two bubble samples are shown in Fig. [TH This clearly 
shows that the C06 bubbles are large compared with the 
full MWP set. The dashed lines in the plot indicate the 
median values of the size distributions; at 1.21', the me- 
dian of the C06 bubbles sizes is twice that of the MWP 
bubbles (0.61^). A two-sample K-S test returned a p- 
value of ~10~^, permitting us to rule out simple sampling 
effects for the observed difference. To investigate the de- 
pendency of the correlation function on bubble size, the 
MWP bubble sample was divided into size bins, con- 
taining the 50% largest bubbles (>0.61'; 2235 bubbles), 
the 25% largest bubbles (>1.26'; 1110 bubbles), and the 
10% largest bubbles (> 2.25'; 448 bubbles). The angu- 
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Figure 14. Normalised distribution functions of bubble sizes for 
the C06 (red) and MWP (blue) samples. Dashed lines indicate 
the median values, at 1.21^and 0.61'for C06 and MWP samples 
respectively. 



lar correlation function was calculated for each of these 
subsamples, the result is shown in Fig. [151 

Interestingly, the clustering signal at the smallest sep- 
aration decreases and a positive correlation around IReff 
emerges as the sample increases in size. The decrease 
in correlation with increasing size mirrors the lower 
YSO sur face density towards the MWP-L bubbles (Sec- 
tion gj]). The correlation for the 10% largest MWP 
bubbles shows a clear overdensity in the 0.8-1 Refr bin. 
While the absolute correlation value is relatively low, this 
point, and that for the 1.2-1.4 Rgff bin, carry the high- 
est statistical significance in the series at 4.4 and 3.4-cr 
respectively; these values are very similar to those seen 
in Fig. E 

To rule out that this is simply related to the lower 
number of sources used in the analysis, we carried out 
the same correlation analysis with a random selection of 
400 MWP bubbles. This was repeated three times, and 
while the scatter of points in the individual instances 
can be large, no statistical overdensities appear. We can 
therefore conclude that the observed overdensity around 
rims of the largest bubbles is real. 

In Section |4T1 the YSO surface density was calculated 
for the MWP-S and MWP-L samples. To investigate the 
overdensity observed along the rim of the bubbles, the 
same calculation was performed for those YSOs lying 
between 0.8 and 1.6 Refr only - the area associated with 
the rim - and compared with the mean surface density 
towards the entire bubble. The mean YSO surface den- 
sity towards the 10% largest bubbles (within 2 Reff) is 
very low at 0.01±0.022 sources/arcmin^ over a 2Reff area, 
consistent with Fig. [151 However for the MWP-S bub- 
bles, just 37% of associated YSOs are found in the region 
around the rim when normalised to the respective area, 
whereas for the 10% largest MWP bubbles this value 
is 53%. In summary, larger bubbles are generally as- 
sociated with fewer YSOs than their small counterparts 
once their larger projected area is taken into account, 
but their associated YSO population is more likely to be 
found in the shells rather than in their interiors. 

Both the 006 and S12 authors identified a strong corre- 
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Figure 15. Angular correlation function for the MWP bubbles 
and RMS YSOs, with the MWP bubbles divided into subsamples 
based on size, as indicated in the plot legend. 

lation between bubble effective radii and effective thick- 
nesses. Given the overdensity of YSOs near the rims of 
the largest of bubbles, we intuitively expect a similar ob- 
servation for the angular correlation between YSOs and 
the thickest MWP bubbles. The correlation analysis was 
similarly calculated for the MWP-L bubbles, and as ex- 
pected the correlation function mirrors that shown in 
Fig.[l5l 

In a final examination of subsamples, we divided the 
YSO sample into the diffuse H ll regions and the other 
source types, and studied the correlation between bub- 
bles and these subsamples separately. As many bub- 
bles are known to enclose evolved H ll regions, studying 
these sources separately may reduce the strong overden- 
sity seen at the smallest YSO-bubbles separations, pos- 
sibly making an overdensity near the bubble rims more 
prominent. This would also support the idea that the dif- 
fuse H II regions are (in part) responsible for the strong 
central correlation between bubbles and RMS sources, 
as they trace the bubble driving sources rather than sec- 
ondary star formation. The result is shown in Fig. [161 
As expected, the correlation at the smallest separations 
is reduced for the sample excluding diffuse H ll regions. 
The diffuse H ll regions show a stronger correlation with 
bubble center positions than the other source types, with 
values reaching zero correlation before IReff, while the 
correlation in the rest of the sample remains positive 
at this point. In the correlation between bubbles and 
the sources without diffuse H ll regions, the O.SReff data 
point is reminiscent of the same point seen for the "10% 
fake YSO" correlation in Fig. [121 hinting at a potential 
shell-associated overdensity, but this is uncertain. The 
result of Fig. [T6l supports the idea that the diffuse H ll re- 
gions in the sample are more likely to be associated with 
the triggering rather than the trigger e(i sources. 

5. DISCUSSION 

We discuss here the physical implications of the find- 
ings presented above. We first describe possible limita- 
tions and biases in our datasets and methods, and dis- 
cuss the implications for the occurrence and prevalence 
of triggered star formation on Galactic scales. Finally, 
we highlight a serendipitous discovery made by studying 
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Figure 16. Angular correlation function of MWP bubbles and 
RMS YSOs, divided into diffuse H ll regions only (black circles) 
and other source types (red x). 

particular instances of bubble- YSO correlations. 

5.1. Limitations of the data and methods 

To place the findings presented here into context, we 
must consider first the limitations of the data and meth- 
ods used in the analysis. The following aspects are of 
particular relevance: completeness; astrometric precision 
and spatial resolution; source type uncertainties. These 
limitations are described here for both RMS and MWP 
catalogs. 

The completeness of the RMS source catalog is seem- 
ingly well understood: T12 estimates the sample to be 
complete for sources with L> lO^L© over the distance 
covered by th e CQ6 bubble sample, w hich is estimated to 
be ^15 kpc (jDeharveng et al.1l2QlQf ). Without distance 
determinations for a significant fraction of the MWP 
bubble sample, however, it is unknown whether the dis- 
tance range covered is similar to that of COG. The 189 
MWP bubbles that have assigned distances do cover a 
similar range, from 2 to 15 kpc approximately. The 
MWP bubble sample itself is limited in completeness 
by the size limitation of the drawing tools, as described 
in Section [21 and some bubbles might have been lost 
through imperfections in the clustering algorithm used 
to reduce the classification data into the catalog. 

Important uncertainties in the results presented here 
stem from the spatial resolution of the RMS sources, 
particularly in relation to the typical bubble sizes. The 
RMS sources are detected at 18'^ in all bands. Given a 
median bubble radius in our full MWP sample of 0.61', 
and a minimum bubble Reff of 0.27^ RMS sources are 
comparably-sized to the smallest bubbles; even median- 
sized bubbles are only a factor of ~3 larger than the RMS 
sources. Thus the uncertainty on a source's placement 
on a bubble rim should be considered to be large, al- 
though random differences are expected to be averaged 
out. This also means we cannot account for those RMS 
sources that are coincident with bubbles as opposed to 
associated with them (e.g. on the rims). 

A second related point in relation to the RMS sources 
is the possible association of diffuse and compact H ll re- 
gions. Compact H ll regions may not necessarily be sep- 
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arate entities from their diffuse counterparts, marking 
for example density peaks of an extended ionized region. 
iMorgan et al.l ()2QQ4l ) showed how bright-rimmed clouds 
surrounding diffuse H II regions can appear as compact 
H II regions. Studies of individual sources should be able 
to distinguish between RMS objects representing trigger- 
ing versus triggered sources (and thus the objects' evo- 
lutionary stage) with relative ease; particularly for those 
regions for which radio data are available. 

Sizes from the MWP-S bubble sample, in addition, 
should be taken with caution: the drawing method for 
these bubbles uses a simple box-drawing tool rather than 
the more precise ellipse-drawing tool used for the MWP- 
L bubbles. Association of the MWP-S bubbles with RMS 
sources in particular are expected to be somewhat more 
uncertain as a result. 

The astrometric precision of the RMS sources is re- 
ported by Lumsden et al.l ([2002) to be 2" ^ however 
these authors found that astrometry of a small but 
non-negligibl e number of so urces was very poor. Simi- 
larly, Mottra m et al.l ()2007[ ) compared RMS astrometry 
with mid-IR higher resolution imaging at 10 /im, and 
found a l-a positional offset of 2" ^ but rising to 5-10^' in 
more complex regions. The positional uncertainty on 
the MWP bubbles was determined from the spread of 
center coordinates in the input classifications, for the 
MWP-L bubbles only (the "dispersion" column in the 
data catalogs). The median ratio of positional dispersion 
to Reff for these bubbles is 0.3. Combining this value 
with a worst-case astrometric precision of 10'^ for the 
RMS sources, the resulting uncertainty on bubble- YSO 
separation could be as high as 0.7 Rgff for the smallest 
bubble, 0.4 Reff for the median bubble and 0.3 Reff for 
the largest bubble in the MWP-L sample. Random er- 
rors in the separation are expected to average out over 
a large sample, however we still expect this to limit the 
significance of our results. 

Finally, the analysis uses the effective radius as an indi- 
cator of the location of the bubble rim. This metric gives 
a convenient approximation for the size of a bubble but 
ignores the bubble's eccentricity, rim thickness, possible 
breaks or blowouts in the shell, and any other complex- 
ities. In the second phase of the Milky Way Project, 
launched in March 2012, images of catalogued bubbles 
will be presented to users for the purpose of gathering 
more precise position and size information for future data 
releases from the project. 

5.2. Interpreting correlation functions in 
three-dimensional space 

The search for overdensities of MYSOs and H ll re- 
gions along bright bubble rims is based on empiri- 
cal eviden ce of star formation o c curring in these re- 
gions (e.g. JPeharveng et al.l I2QQ8I: IZavagno et all 120101 : 
iKang et al.l l2009f ). Bubble morphologies are however 
three-dimensional (3D) and often very complex given 
their dense and turbulent surroundings. Projection 
effects and ellipsoidal morphologies can place driving 
sources of the bubble expansion in apparently off-center 
locations, and the velocity dispersions of MYSOs may 
further complicate the observed bubble- YSO association. 
The difficulty in studying the 3D geometry of bubbles 
was shown by iBeaumont fc W illiams' (2010") , who used 
CO observations to recover a fiattened ring-like rather 



than spheroidal morphology for a sample of 43 COG bub- 
bles. 

A detailed interpretation of the correlation functions 
presented here requires a thorough theoretical investi- 
gation of the expected appearance of bubbles and their 
associated YSOs in the presence of triggering, that takes 
into account 3D morphologies, projection effects in the 
Galactic plane, and realistic YSO velocity dispersions. 
Such models can be used to reconstruct the expected 
shape of the correlation functions. Such a study will 
form the subject of a follow-up paper. 

5.3. On the prevalence of triggered massive star 
formation 

The analysis presented here shows that massive young 
stellar objects and H ll regions show a strong positional 
correlation with the infrared bubbles that populate the 
galactic ISM. The YSO surface density is greatest to- 
wards smaller (thinner) bubbles, consistent with the re- 
cent findings by T12. The correlation at the small- 
est bubble-MYSO separations decreases with increasing 
bubble size. 

The increased correlation between YSOs and bubbles 
at the smallest separations is most likely due to a higher 
level of completeness of the MWP sample, and as a result 
more RMS ultra-compact, compact and diffuse H ll re- 
gions are found to be associated with an infrared bub- 
ble. In addition, many RMS sources located near bub- 
ble centers are likely to trace the bubble driving source 
rather than second-generation star formation. Driving 
sources of large (nearer or more evolved) bubbles ap- 
pear extended in the MSX beam and are therefore less 
likely to be included in the point source catalog. The 
diffuse H ll regions included in the RMS catalog are 
clearly strongly centrally associated with MWP bubbles 
(see Fig. [T6|) . but we note that (ultra-) compact H ll re- 
gions seen in RMS may represent the bright-rimmed 
clouds often associated with their larger diffuse counter- 
parts. 

The positional uncertainties described above and the 
geometric approximation to the bubble rims does not al- 
low for a more precise positional correlation. This is a 
particular problem for the MWP-S bubbles, whose po- 
sitional uncertainties cover a significant fraction of their 
radii. A more sophisticated correlation that makes use of 
the full ellipticity information of the bubbles could refine 
the result, particularly using a future data release from 
MWP, which will include better estimates of uncertain- 
ties on the bubble parameters and improved statistics 
from an increased number of classifications. 

A significant overdensity of MYSOs is observed to- 
wards the rims of the largest bubbles. Numerous ob- 
servers have found evidence of ongoing star formation, 
both low- and high-mass, around bubbles and and shells 
in known star forming regions; many interpret this as 
evidence of triggered or sequential star formation. What 
do our findings tell us about the occurrence or the preva- 
lence of triggered star formation? 

As the bubble radius is a dynamical quantity 
(i.e. changeable over time and strongly environment- 
dependent), it is hard to conceive of a selection effect 
that produces the observed overdensity of YSOs near the 
rims of large bubbles. 

Following criteria set out by lElmegreenI (|2011f ) for the 
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identification of triggered star formation, we conclude 
that the analysis performed here can neither confirm nor 
reject conclusively the presence of triggered star forma- 
tion in the Galaxy. The result is strongly limited by the 
lack of distances for the MWP (and C06) bubbles, which 
do not allow a firm determination of bubbles' physical 
sizes or ages. A firm age determination furthermore is 
non-trivial, as it requires knowledge of the luminosity 
and spectral type of the driving source (s) of the bubble 
expansion, and information on the ambient ISM density. 
We are therefore limited to a qualitative interpretation 
of the results. 

The observed overdensity of RMS sources projected 
towards the rims of the largest bubbles appears con- 
sistent with a mode of trigg ered s t ar for mation. 
iWhitworth et al.l (J1994D and Dale et al.l (j2QQ7a[ ) calcu- 
late timescales for shell fragmentation around expand- 
ing H II regions, possibly leading to collect and collapse- 
driven star formation: for a bubble blown by a single 
late-type O star, expanding into a 100 cm~^ uniform 
neutral medium, shell fragmentation will typically occur 
after ~3 Myr. Typical ages of the RMS sources in our 
sample are of the order of 10^ — 10^ years (the diffuse 
H II regions are more evolved). Although many parame- 
ters come into play, it seems reasonable to assume that 
only the larger of the MWP bubbles are likely to have 
a sufficient age, and are driven by sufficiently energetic 
sources, for a second generation of star formation to have 
occurred. 

T12 conclude that the majority of MYSOs they find 
within 2 radii from a C06 bubble, ~14% of the total sam- 
ple, are likely to be triggered. Overall we find that 
67=b3% of RMS sources are located within a projected 
distance of 2 bubble radii, and 22±2% are found in the 
region of 0.8-1.6 Reff (assuming Poisson statistics). Fol- 
lowing T12's definition of MYSO's 'associated' with bub- 
bles, we could state that 67% of MYSO may have been 
triggered, however given the uncertainties and the results 
with bubble subsamples we posit that the lower figure is 
more accurate. 

As for the general prevalence of triggered star forma- 
tion, our analysis is further limited by the choice of the 
RMS catalog of YSOs. The large beam of the MSX satel- 
lite (18'') means that a significant fraction of the sources 
are likely to contain multiple YSOs, particularly those 
at large distances (18''at 15 kpc corresponds to >1 pc in 
size). In one follow-up st udy using 10 /im ground-based 
imaging, Mo ttram et al.l (2007) found a multiplicity of 
25% in a sample of -350 RMS MYSOs. 

While the RMS catalog may contain contributions 
from low- and intermediate mass young stellar popu- 
lations, this regime is not well covered by the catalog. 
As RDI is thought to preferenti ally prod uce low- and 
intermediate- mass stars (.Lee fc Chenll2007[ ). our method 
may not be sensiti ve to all triggered pop ulations near 
the MWP bubbles. IWhitworth et al.l (jl994[ ) showed that 
gravitational collapse of a dense shell surrounding ex- 
panding shells, thought to lead to collect & collapse star 
formation, produ ces predominantly massive fragments. 
iDale et aTl (|2011f ) however argue that the fragment mass 
function cannot be reliably used to predict the mass 
function of resulting stars or clusters. Our use of the 
RMS catalog therefore gives a slight bias towards detect- 
ing triggered star formation from the collect & collapse 



mechanism but our results cannot convincingly distin- 
guish between modes of triggering. 

This study highlights that, although challenging to 
produce, robust and well-documented catalogs of young 
stellar sources are extremely important for studying star 
formation on a Galactic scale using statistical methods. 

5.4. A distant massive cluster rediscovered using 
bubble-YSO associations 

Of all bubbles associated with RMS YSOs, the median 
number of associated RMS sources is 1. Given the num- 
ber counts in each dataset, this low association figure is 
unsurprising. More than 20 MWP bubbles are however 
associated with at least 5 RMS sources, and our highest- 
association bubble offers an interesting case study. 

The MWP bubble with the highest number of asso- 
ciated RMS sources, shown in Fig. ^7\ is located at 
{l,b) = (-21.6, +0.13). Bubble MWP1G338393+001277 
measures approximately 6' in diameter and has a rela- 
tively high eccentricity of 0.55. It is surrounded by sev- 
eral smaller bubbles in a hierarchy and 14 RMS young 
stellar sources, indicating that star forming activity is 
likely to be ongoing in the region. The bubble can be 
identified as the shell surrounding Mercer 81, the re- 
cently discovered massive stellar cluster at the far end 
of th e galactic Bar near the intersection with the Norma 
arm (Davi es et al.l [2012). Davi es et al.l ()2012 ') place the 
cluster at 11 ±2 kpc, giving the main bubble a physical 
radius of —20 pc. The distance determination is consis- 
tent with the RMS catalog distances, which place 12 of 
the 14 sources projected towards the bubble at 12.9 kpc. 
With an estimated mass of > lO^M©, Mercer 81 is one 
of just 2 known young clusters at the far end of the Bar. 

The environment of Mercer 81 demonstrates the prob- 
lem of morphological complexity discussed in Section ISTTI 
in the context of our analysis very well: the bubble is 
clearly eccentric and opened out to the South. Even 
with the ellipse-drawing tool the bubble shape is only 
approximately matched to the actual morphology. The 
RMS sources clearly trace the bright PAH emission along 
the rim, but this region is not very well described by the 
user-drawn ellipses. 

Given a visual extinction towards the cluster of Ay — 
45 ±15 and the heavy stellar crowding at optical and NIR 
wavelengths, such clusters are extremely challenging to 
detect using traditional search methods. Our serendipi- 
tous recovery of this cluster based purely on associations 
of bubbles and young stellar sources demonstrates the 
potential of the MWP dataset to explore not just massive 
star formation but potentially discover heavily extincted 
clusters to large distance in the Galactic plane. 

The discovery of the massive cluster powering H ll re- 
gion RCW 79 based solely on the presence of an H ll re- 
gion wit h star formation occ urring in its vicinity, re- 
ported in lMartins et al.l ()2010l ). offers another noted ex- 
ample. 

6. CONCLUSION 

We studied the statistics of massive star formation in 
the vicinity of infrared bubbles in the inner Galactic 
plane, recently catalogued by the Milky Way Project, 
a Zooniverse citizen science initiative. Our detailed sta- 
tistical correlation analysis between these bubbles and 
the RMS catalog of MYSOs and H ll regions shows a 
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Figure 17. Bubble MWP1G338393+001277, the MWP bubble with the highest number of associated RMS young stellar sources within 
2 radii (Reff=6 from its center, which was found to enclose the newly discovered massive cluster Mercer 81 at a distance of llzb2 
kpc (Davies et al. 2012). The image was created from 4.5/8.0/24.0/im images from the Spitzer GLIMPSE/MIPSGAL surveys. Additional 
MWP-L bubbles in the field are also shown as ellipses. RMS sources are shown with + (YSOs), squares (H ll regions) and diamonds 
(diffuse H ll regions) 



high level of clustering of MYSOs and H ll regions and 
the bubbles, reflecting the clustered mode of massive star 
formation on the one hand, and indicating a strong as- 
sociation of bubbles with massive star formation activity 
on the other. 

The largest (and thickest) bubbles in the MWP sam- 
ple show a statistically significant overdensity of young 
stellar sources near their rims. A qualitative analysis 
of timescales for bubble expansion and shell fragmenta- 
tion indicates that these large bubbles are likely to be 
old enough to have caused the collapse of their dense 
shells. While our analysis cannot prove or disprove the 
occurrence of triggered star formation in these regions, 
the finding supports a collect & collapse-driven mode of 
triggered star formation on the rim of expanding bubbles. 
We find that 67±3% of the RMS young stellar sources lie 
within 2 radii of a MWP bubble, and 22±2% lie within 
0.8-1.6 radii. 

The most important caveats of this result are related 
to the uncertainty on the bubble- YSO separation, due 
to the astrometric precision and extended sizes of the 
sources in both catalogs, the geometric approximation 
used for the bubble sizes, and the uncertainty in source 
types for the RMS sources. Thus we are unable to dis- 
tinguish between those RMS sources that are coincident 
with bubbles versus those that are potentially triggered 
nearby. The interpretation of the results is limited by 
the lack of MWP bubble distances and the complexity of 
characterising bubble physical sizes and ages. 

Whereas many studies of triggered star formation have 
focused on detection of star formation activity in known 
star forming regions, the analysis presented here shows 
how statistical methods can be used to identify potential 
regions of triggering in an unbiased way. The indepen- 



dent recovery of a recently discovered massive cluster at 
11±2 kpc with Ay ~ 45 ± 15, Mercer 81, by closer in- 
vestigation of a strong bubble-MYSO association alone 
demonstrates the potential of statistical comparisons of 
large-scale multi- wavelength Galactic surveys - not just 
for the study of Galactic-scale star formation, but for the 
discovery of stellar clusters to large distances. 

The advent of data from large-scale Galactic surveys at 
infrared, submillimeter and millimeter wavelengt hs (e.g. 



WISE, Wright et al. 2 Q1QI; ATL ASGAL, SchulleFet^ 
2009; and HiGAL, Moli nari et al . 2010) offer the exciting 
prospect of extending such statistical studies to longer 
wavelengths and larger areas of the Galaxy, allowing for 
a more complete census of Galactic star formation and 
a more comprehensive view on the lifecycle of molecular 
material in the Milky Way. This was recently illustrate d 
using ATLASGAL survey data bv lBeuther et all (|2012f ). 
Finally, it is important to note that the ability, pro- 
vided by the MWP's large catalog, to constrain the cor- 
relation function of bubbles provides a stringent test for 
future models of triggered star formation. Given appro- 
priate assumptions or measurements of the age of a pop- 
ulation of YSOs and young stars a model of triggering 
that includes estimates of efficiency and an expansion 
of the bubbles should predict the form of the observed 
correlation function. 

7. CODE 

The main body of code to perform the correlation 
and auto-correlation analyses presented in this paper was 
written in Python version 2.6, and is available online via 
the journal. In addition, it is freely available for down- 
load as a public Github repository webpagfi. We invite 

^^ https://github.com/skendrew 
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and encourage other authors to download the code, reuse 
or improve it for reproduction of these results or for sim- 
ilar analyses. 
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